ABSTRACT

The nearly nonstationary first order autoregression is a sequence of processes
where the autoregressive coefficient tends to 1 as $n \to \infty$. M-estimates of the
autoregressive coefficient are considered. The process is allowed to be non-Gaussian,
but a $2 + \delta$ moment condition is assumed. The limiting distribution is not the usual
normal  limit  but  is  characterized  as  a  ratio  of  two  stochastic  integrals.  The 
asymptotically  most  efficient  M-estimate  is  not  given  by  maximum  likelihood. 
However,  it  is  shown  that  the  loss  of  efficiency  in  using  maximum  likelihood  is  no 
worse  than  about  20%,  whereas  the  usual  least  squares  estimator  can  have  arbitrarily 
low  efficiency. 
1. Introduction

The aim of this work is to study asymptotic properties of M-estimators of the
autoregressive parameter $\phi$ of a nearly non-stationary first order autoregressive process,
and to obtain efficient M-estimators of $\phi$. We consider the sequence
$\{y_n(k) : 0 \le k \le n\}_{n=1}^{\infty}$ of first order autoregressive AR(1) processes

$$y_n(k) = \phi_n\, y_n(k-1) + \varepsilon(k) \tag{1.1}$$

where we assume $\{\varepsilon(k)\}_{k=-\infty}^{\infty}$ is a sequence of iid random variables with mean zero and
finite $(2+\delta)$-moment, for some positive $\delta$, and $\phi_n$ is allowed to vary with $n$. Specifically,
we will assume

$$\phi_n = 1 - \frac{\beta}{n} \tag{1.2}$$

for some $\beta > 0$, so that $y_n$ tends to look like a non-stationary random walk for large $n$.
Also we will assume that we have some knowledge of the starting value $y_n(0)$, either by
considering it as a constant or by assuming it is a random variable with known distribution.
In principle we are interested in the asymptotic behavior of estimators of the
form:

$$\hat\phi_n = \arg\min_{\phi} \sum_{k=1}^{n-1} \rho\big(y_n(k+1) - \phi\, y_n(k)\big) \tag{1.3}$$

for some function $\rho$. Here, arg min denotes the value of $\phi$ where a minimum is achieved.

For example, taking $\rho(u) = u^2$ equation (1.3) gives the least squares estimator, LSE, of $\phi$.
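
The model (1.1)-(1.2) and the least squares case of (1.3) can be illustrated with a minimal sketch. The function names, the seed, the choice of standard normal innovations and the starting value $y_n(0) = 0$ are assumptions made only for this illustration; the paper allows non-Gaussian innovations.

```python
import random


def simulate_nns_ar1(n, beta, seed=0, y0=0.0):
    """Simulate y(0), ..., y(n) from y(k) = phi_n * y(k-1) + eps(k), phi_n = 1 - beta/n.

    Innovations are standard normal here purely for illustration.
    """
    rng = random.Random(seed)
    phi_n = 1.0 - beta / n
    y = [y0]
    for _ in range(n):
        y.append(phi_n * y[-1] + rng.gauss(0.0, 1.0))
    return phi_n, y


def least_squares_phi(y):
    """Closed-form minimizer of sum_k (y(k+1) - phi * y(k))**2 over all available pairs."""
    pairs = list(zip(y[:-1], y[1:]))
    numerator = sum(prev * nxt for prev, nxt in pairs)
    denominator = sum(prev * prev for prev, _ in pairs)
    return numerator / denominator


n, beta = 2000, 2.0
phi_n, y = simulate_nns_ar1(n, beta, seed=1)
phi_hat = least_squares_phi(y)
print("phi_n =", phi_n, "LSE =", phi_hat, "n*(LSE - phi_n) =", n * (phi_hat - phi_n))
```

The quantity printed last is the scaled estimation error whose limiting distribution is studied in the rest of the paper.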

It is known that the LSE of $\phi$, for fixed $\phi$ with $|\phi| < 1$, is asymptotically normal
$N(0, 1-\phi^2)$, but when $\phi = 1$ the normal approximation fails (see
e.g. Fuller (1976), section 8.5). White (1958) was able to represent the asymptotic
distribution of the estimation error when $\phi = 1$ (i.e. $\beta = 0$ in (1.2)) as

$$n(\hat\phi_n - 1) \Rightarrow \frac{\int_0^1 W(s)\,dW(s)}{\int_0^1 W^2(s)\,ds}$$

where $W$ denotes a standard Brownian Motion process and $\Rightarrow$ denotes convergence in
distribution.  Rao  (1978),  Dickey  and  Fuller  (1979),  and  Evans  and  Savin  (1981)  have 
obtained  representations  of  this  limiting  distribution.  For  the  nearly  non-stationary 
(NNS) model of equation (1.1), Cumberland and Sykes (1982) found that the normalized
processes $n^{-1/2} y_n([nt])$ converge in distribution to an Ornstein-Uhlenbeck process
defined by the Itô Stochastic Differential Equation (SDE)

$$dY(t) = -\beta\, Y(t)\,dt + \sigma\, dW(t). \tag{1.4}$$

Bobkoski (1983) independently proved the latter result and based on this convergence
obtained

$$n(\hat\phi_n - \phi_n) \Rightarrow \frac{\int_0^1 Y(s)\,dW(s)}{\int_0^1 Y^2(s)\,ds} \tag{1.5}$$

where $\phi_n$ is given by (1.2). Chan and Wei (1985) obtained similar results for the NNS
model and found that when the parameter $\beta$ goes to infinity the asymptotic distribution
of the "t statistic" $\big[\sum_{k=1}^{n-1} y_n^2(k)\big]^{1/2}(\hat\phi_n - \phi_n)$ is standard normal, which is in agreement with
intuition, since for large $\beta$ it takes longer for the non-stationary behavior to manifest
itself.
In this work we obtain the weak limit of the M-estimator when $\phi_n = 1 - \beta/n$. Martin
and Jong (1977) showed that the (generalized) M-estimator is asymptotically normal
when $\phi_n \equiv \phi$ with $|\phi| < 1$; specifically it follows from the work of these authors that
under standard regularity conditions (e.g. (2.A) and (2.B) below)

$$n^{1/2}(\hat\phi_n - \phi) \Rightarrow N\big(0, (1-\phi^2)\, v_\psi\big)$$

where

$$v_\psi = \frac{E\big[\psi^2(\varepsilon(1))\big]}{\sigma^2\,\big[E\,\psi'(\varepsilon(1))\big]^2}, \qquad \psi(u) = \frac{d\rho(u)}{du}, \qquad \psi'(u) = \frac{d\psi(u)}{du}.$$


A simple variational argument will show the most "efficient" M-estimator (the one
minimizing $v_\psi$) is obtained from $\rho = -\log(f)$ where $f$ is the density of the $\varepsilon$'s, i.e. when $\hat\phi_n$
is the maximum likelihood estimator, MLE, conditioned on the initial value $y_n(0)$. Other
efficiency results for the stationary AR(1) process when the errors are not normal can
be found in Johnson and Akritas (1982). For the nearly non-stationary model where $\phi_n$
is given by (1.2), a similar calculation based on the limit theorems presented here
indicates that the MLE will generally not be the most "efficient" M-estimator. Indeed, the
function which works "best" is a linear combination of the LSE and MLE criterion
functions.

The asymptotic results that we present in this work deal with convergence in
distribution of a sequence of stochastic processes with sample paths in $D_{\mathbb{R}^d}[0,T]$, the space
of $\mathbb{R}^d$-valued functions defined on $[0,T]$ that are right continuous and have left
limits, to a process with sample paths in $C_{\mathbb{R}^d}[0,T]$, the space of continuous
$\mathbb{R}^d$-valued functions on $[0,T]$. The sequences of processes we investigate here are solutions
of stochastic difference equations; in a natural way one might expect that if the
difference equation "converges" in some sense to a (stochastic) differential equation then the
solutions of these equations would be "near" each other.

We base our proofs on the Stroock and Varadhan characterization of the solution
of an SDE as the solution of an associated martingale problem. For a detailed account
see e.g. Ethier and Kurtz (1986), section 5.3, or Stroock and Varadhan (1979), Chapter 6.
We obtain the asymptotic results of later sections from the following Diffusion
Approximation Theorem due to Ethier and Kurtz.

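Theorem 1: (Theorem 7.4.1, Ethier and Kurtz (1986))

Let $a = ((a_{ij}))$ be a continuous, symmetric, nonnegative definite $d \times d$ matrix-valued
function on $\mathbb{R}^d$ and let $b : \mathbb{R}^d \to \mathbb{R}^d$ be continuous. Let $A$ be the second order differential
operator on $C_c^\infty(\mathbb{R}^d)$ given by

$$Af = \frac{1}{2}\sum_{i,j=1}^{d} a_{ij}\,\partial_i \partial_j f + \sum_{i=1}^{d} b_i\,\partial_i f, \qquad f \in C_c^\infty(\mathbb{R}^d)$$

and suppose the $C_{\mathbb{R}^d}[0,\infty)$-martingale problem for $A$ is well-posed.

For $n = 1, 2, \ldots$, let $X_n$ and $B_n$ be processes with sample paths in $D_{\mathbb{R}^d}[0,\infty)$ and let
$A_n = ((A_n^{ij}))$ be a symmetric $d \times d$ matrix-valued process such that $A_n^{ij}$ has sample paths in
$D_{\mathbb{R}}[0,\infty)$ and $A_n(t) - A_n(s)$ is nonnegative definite for $t > s \ge 0$. Set
$\mathcal{F}_t^n = \sigma(X_n(s), B_n(s), A_n(s) : s \le t)$.

Let $\tau_n^r = \inf\{t : |X_n(t)| \ge r \text{ or } |X_n(t-)| \ge r\}$ and suppose

$$M_n = X_n - B_n \quad\text{and}\quad M_n^i M_n^j - A_n^{ij}, \qquad i, j = 1, 2, \ldots, d$$

are local $\{\mathcal{F}_t^n\}$-martingales, and that for each $r > 0$ and $T > 0$,

$$\lim_{n\to\infty} E\Big[\sup_{t \le \min(T,\tau_n^r)} |X_n(t) - X_n(t-)|^2\Big] = 0 \tag{1.8}$$

$$\lim_{n\to\infty} E\Big[\sup_{t \le \min(T,\tau_n^r)} |B_n(t) - B_n(t-)|^2\Big] = 0 \tag{1.9}$$

$$\lim_{n\to\infty} E\Big[\sup_{t \le \min(T,\tau_n^r)} |A_n^{ij}(t) - A_n^{ij}(t-)|\Big] = 0 \tag{1.10}$$

$$\sup_{t \le \min(T,\tau_n^r)} \Big|B_n^i(t) - \int_0^t b_i(X_n(s))\,ds\Big| \xrightarrow{P} 0 \tag{1.11}$$

$$\sup_{t \le \min(T,\tau_n^r)} \Big|A_n^{ij}(t) - \int_0^t a_{ij}(X_n(s))\,ds\Big| \xrightarrow{P} 0 \tag{1.12}$$

Suppose that $X_n(0)$ converges weakly to a random variable with distribution $\nu$, then
$\{X_n\}$ converges in distribution to the solution of the martingale problem for $(A, \nu)$. □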


Remark:  By  the  representation  mentioned  before  the  limiting  process  corresponds  to 
the  diffusion  with  infinitesimal  generator  given  by  A . 

The rest of the paper is organized as follows: In Section 2 we formalize our
problem and state the asymptotic theorem. In Section 3 we derive an expression for the
asymptotic mean squared error, MSE, and find the form of an optimal M-estimator.
Next, we compare the MSE of the LSE and conditional MLE versus the asymptotic MSE
of the optimal M-estimator. In Section 4 we show some results needed for the proof of
the asymptotic theorem and give the proof.

2.  Statement  of  the  Main  Theorem 

Assume $\rho$ in (1.3) is differentiable and set $\psi = \rho'$ as before. Also assume that the
following statements for the $\psi$ function hold:

(2.A)

$\psi$ is continuously differentiable and satisfies the second order Lipschitz condition

$$\psi(t) - \psi(t_0) - (t - t_0)\,\psi'(t_0) = C\,(t - t_0)^2\,\alpha(t, t_0) \tag{2.1}$$

where $C$ is a positive constant and $|\alpha(t, t_0)| \le 1$.

(2.B)

The $(2+\delta)$ order moments of $\varepsilon(1)$, $\psi(\varepsilon(1))$ and $\psi'(\varepsilon(1))$ are finite for some positive $\delta$.

(2.C)

$E\,\psi(\varepsilon(1)) = 0$ and $E\,\psi'(\varepsilon(1)) = 1$. The assumption $E\,\psi'(\varepsilon(1)) = 1$ involves no loss of
generality provided $E\,\psi'(\varepsilon(1)) \ne 0$.

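Now, for $\hat\phi_n$ to be a solution of (1.3), it is necessary that

$$\Psi(\hat\phi_n) = \sum_{k=1}^{n-1} y_n(k)\,\psi\big(y_n(k+1) - \hat\phi_n\, y_n(k)\big) = 0 \tag{2.2}$$

Hence if we let

$$t = y_n(k+1) - \hat\phi_n\, y_n(k) \quad\text{and}\quad t_0 = y_n(k+1) - \phi_n\, y_n(k) = \varepsilon(k+1) \tag{2.3}$$

in (2.1), equation (2.2) becomes, with $\alpha(k) = \alpha(t, t_0)$,

$$\Big[\sum_{k=1}^{n-1} y_n(k)\,\psi(\varepsilon(k+1))\Big] - (\hat\phi_n - \phi_n)\sum_{k=1}^{n-1} y_n^2(k) - (\hat\phi_n - \phi_n)\Big[\sum_{k=1}^{n-1} y_n^2(k)\big(\psi'(\varepsilon(k+1)) - 1\big)\Big] + (\hat\phi_n - \phi_n)^2\, C\,\Big[\sum_{k=1}^{n-1} y_n^3(k)\,\alpha(k)\Big] = 0 \tag{2.4}$$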
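
The estimating equation (2.2) can be solved numerically for a given score function. The following is a minimal sketch, not the authors' procedure: the score `psi` (the location score of the standard logistic density), the bisection solver, the bracket `[0, 2]`, the seed and the simulated data are all assumptions chosen for illustration. Bisection is applicable here because this particular `psi` is increasing, which makes the estimating function non-increasing in `phi`.

```python
import math
import random


def psi(u):
    """Illustrative smooth score: tanh(u/2), the location score of the standard logistic density."""
    return math.tanh(u / 2.0)


def estimating_function(phi, y):
    """Psi(phi) = sum_k y(k) * psi(y(k+1) - phi * y(k)), in the spirit of (2.2)."""
    return sum(y[k] * psi(y[k + 1] - phi * y[k]) for k in range(len(y) - 1))


def solve_m_estimate(y, lo=0.0, hi=2.0, tol=1e-10):
    """Bisection root of Psi; an increasing psi makes Psi non-increasing in phi."""
    if not (estimating_function(lo, y) > 0.0 > estimating_function(hi, y)):
        raise ValueError("initial bracket [lo, hi] does not contain a sign change")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if estimating_function(mid, y) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical data: n = 500, beta = 2, standard normal innovations, y(0) = 0.
rng = random.Random(7)
n, beta = 500, 2.0
phi_n = 1.0 - beta / n
y = [0.0]
for _ in range(n):
    y.append(phi_n * y[-1] + rng.gauss(0.0, 1.0))

phi_m = solve_m_estimate(y)
print("phi_n =", phi_n, "M-estimate =", phi_m)
```

Note that this `psi` is not rescaled to satisfy $E\,\psi'(\varepsilon(1)) = 1$; the root of the estimating equation is unaffected by a positive rescaling of the score.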


The  main  result  in  this  paper  is  summarized  by  the  following  theorem. 

Theorem 2: Suppose assumptions (2.A) to (2.C) hold. Let $\phi_n = 1 - \beta/n$ with $\beta$ a
positive real constant. Then under the model (1.1) with $y_n(0) = \sum_{k=0}^{\infty} \phi_n^k\, \varepsilon(-k)$:

(a) There exists a sequence $\{\hat\phi_n\}$ of solutions of equation (2.2) such that

$$(\hat\phi_n - \phi_n) = O_p(n^{-1}) \tag{2.5}$$

(b) For such a sequence

$$n(\hat\phi_n - \phi_n) \Rightarrow \frac{\int_0^1 Y(s)\,dW_2(s)}{\int_0^1 Y^2(s)\,ds} \tag{2.6}$$

where $Y(t)$ is the Ornstein-Uhlenbeck process defined by the stochastic differential
equation

$$dY(t) = -\beta\, Y(t)\,dt + dW_1(t), \qquad Y(0) \sim N\Big(0, \frac{\sigma^2}{2\beta}\Big), \tag{2.7}$$

and $(W_1(t), W_2(t))'$ is a two dimensional Brownian motion with

$$E[W_1^2(t)] = t\,E[\varepsilon^2(1)],$$

$$E[W_2^2(t)] = t\,E[\psi^2(\varepsilon(1))],$$

$$E[W_1(t)\,W_2(t)] = t\,E[\varepsilon(1)\,\psi(\varepsilon(1))]. \qquad \square$$


Remark: Implicitly stated in the assumed initial condition for the sequence of
AR(1) processes is the assumption that for each $n$, the process is stationary. Thus it is not
surprising that the initial condition for the Ornstein-Uhlenbeck process of equation (2.7)
is the one needed to ensure the stationarity of such a process (Arnold (1974), page 135).

The weak limit in (2.6) is suggested by neglecting the last two terms of the LHS of
(2.4), so that

$$n(\hat\phi_n - \phi_n) \approx \frac{n^{-1}\sum_{k=1}^{n-1} y_n(k)\,\psi(\varepsilon(k+1))}{n^{-2}\sum_{k=1}^{n-1} y_n^2(k)} \tag{2.8}$$

Define

$$\eta(k) = \big(\varepsilon(k),\; \psi(\varepsilon(k)),\; \psi'(\varepsilon(k)) - 1\big)'$$

and let $\Sigma$ be the variance-covariance matrix of the random vector $\eta(1)$. Now we define
the stochastic processes $Y_n(t)$ and $W_n(t)$ for $t$ in $[0,1]$ by

$$Y_n(t) = n^{-1/2}\, y_n([nt]) \tag{2.9}$$

$$W_n(t) = \big[W_{1,n}(t),\, W_{2,n}(t),\, W_{3,n}(t)\big]' = n^{-1/2}\sum_{k=1}^{[nt]} \eta(k) \tag{2.10}$$

(with the usual convention that summation equals zero when the upper limit is smaller
than the lower). The $W_{3,n}$ component does not appear in the limiting distribution but is
used in the proof.

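Let $\Delta$ be the usual forward difference operator (i.e. $\Delta m(k) = m(k+1) - m(k)$) and
$\Delta t = n^{-1}$. Then (2.8) can be written as

$$n(\hat\phi_n - \phi_n) \approx \frac{\sum_{k=1}^{n-1} Y_n(k/n)\,\Delta W_{2,n}(k/n)}{\sum_{k=1}^{n-1} Y_n^2(k/n)\,\Delta t} \tag{2.11}$$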

Let $W(t) = [W_1(t), W_2(t), W_3(t)]'$ be a three-dimensional Brownian motion such that the
variance-covariance matrix of the random vector $W(t)$ is $t\Sigma$. It can be proven by means
of the Martingale Central Limit Theorem (see e.g. Ethier and Kurtz (1986), section 7.1)
that the process $W_n$ defined in (2.10) converges weakly to $W$. Since $Y_n$ converges to $Y$
(see Cumberland and Sykes (1982)) it is natural to think of the summations in (2.11) as
the Riemann-Stieltjes sums for the integrals in (2.6), and we will show in Theorem 3
below, among other things, that the two summations in (2.11) jointly converge to the
corresponding integrals in (2.6).


3.  Optimality 


We now explore the optimality of the M-estimators under a natural criterion. Our
approach is to minimize an asymptotic mean squared error

$$Q = Q(\psi) = E\left[\left(\frac{\int_0^1 Y(s)\,dW_2(s)}{\int_0^1 Y^2(s)\,ds}\right)^{2}\right] \tag{3.1}$$

Surprisingly, we have found that this criterion leads to the finding that the optimal $\psi$
function is a linear combination of $\eta_1(x) = x$ and $\eta_2(x) = -I_f^{-1} f'(x)/f(x)$, where $f$ is the
probability density function of the innovations (assuming it exists) and $I_f$ is the Fisher
information of the location parameter problem for the common distribution of the noise.
Note that $\eta_1$ corresponds to the least squares score function while $\eta_2$ is proportional to
the usual score function of the MLE. The $\psi$ function so obtained is not directly useful as
an estimator since the coefficients of the linear combination depend on the unknown
parameter $\beta$. Nonetheless, it does immediately suggest a two stage procedure that may
be useful. The first stage is to estimate $\phi_n$ by say the MLE and hence $\beta$ by
$\hat\beta = n(1 - \hat\phi_n)$. One can then find the optimal $\psi$ function for the estimate $\hat\beta$ and the
second stage consists of finding the solution of the M-estimation equation for this $\psi$.

To prove the claim we can think of $Q$ as a functional on
$L^2(f) = \{\xi : \int \xi^2(x) f(x)\,dx < \infty\}$. We would like to find the minimizer of $Q$ on $L^2(f)$ subject
to the constraints in (2.C), i.e. $\int \xi(x) f(x)\,dx = 0$ and $\int \xi'(x) f(x)\,dx = 1$. We have shown in
the Appendix that $Q$ can be written as

$$Q(\psi) = \sigma^{-2}(L_1 - L_2)\,\mathrm{Cov}^2\big[\varepsilon(1), \psi(\varepsilon(1))\big] + L_2\,\mathrm{Var}\big[\psi(\varepsilon(1))\big] \tag{3.2}$$

where, with $Y$ and $W$ as in (1.4),

$$L_1 = E\left[\left(\frac{\int_0^1 Y\,dW}{\int_0^1 Y^2\,ds}\right)^{2}\right] \quad\text{and}\quad L_2 = E\left[\left(\int_0^1 Y^2\,ds\right)^{-1}\right] \tag{3.3}$$

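Hence $Q$ is a positive definite quadratic functional and since the constraints are linear,
the solution to the minimization problem is obtained by setting the first variation (with
respect to $\psi$) of the Lagrangian

$$Q(\psi) + \lambda_1 E\big(\psi(\varepsilon(1))\big) + \lambda_2\big[E\big(\psi'(\varepsilon(1))\big) - 1\big] = \sigma^{-2}(L_1 - L_2)\Big[\int x\,\psi(x) f(x)\,dx\Big]^2 + L_2\int \psi^2(x) f(x)\,dx + \lambda_1\int \psi(x) f(x)\,dx + \lambda_2\Big[\int \psi'(x) f(x)\,dx - 1\Big]$$

equal to zero, and choosing the multipliers $\lambda_1$ and $\lambda_2$ so that the constraints hold. This
operation followed by an integration by parts leads to the equation

$$\Big[2\sigma^{-2}(L_1 - L_2)\int y\,\psi(y) f(y)\,dy\Big]\, x f(x) + 2L_2\,\psi(x) f(x) + \lambda_1 f(x) - \lambda_2 f'(x) = 0$$

whence

$$\psi(x) = \kappa x + \frac{\lambda_2}{2L_2}\,\frac{f'(x)}{f(x)} - \frac{\lambda_1}{2L_2} \tag{3.4}$$

where

$$\kappa = -\frac{L_1 - L_2}{\sigma^2 L_2}\,\mathrm{Cov}(\psi, \varepsilon)$$

where $\psi$ and $\varepsilon$ are shorthand for $\psi(\varepsilon(1))$ and $\varepsilon(1)$ respectively. It is easy to see that both
$E(\varepsilon) = 0$ and the constraint $E(\psi) = 0$ imply $\lambda_1 = 0$, under the usual regularity conditions on
$f$ that allow the interchange of the integral and derivative. Thus the optimal $\psi$ is a
linear combination of the least squares and maximum likelihood criterion functions.

Also the constraint $E(\psi') = 1$ implies

$$\frac{\lambda_2}{2L_2} = I_f^{-1}(\kappa - 1)$$

Substitution of the value of the multipliers into (3.4) gives

$$\psi(x) = \kappa x + (\kappa - 1)\, I_f^{-1}\,\frac{f'(x)}{f(x)} \tag{3.5}$$

Calculating $\mathrm{Cov}(\psi, \varepsilon)$ for $\psi$ in equation (3.5) gives that

$$\kappa = \frac{L_2 - L_1}{L_2 - L_1(1 - \sigma^2 I_f)}$$

Plugging this value in the definition of $\psi$ gives

$$\psi(x) = \frac{(L_2 - L_1)\,x - \sigma^2 L_1\,\dfrac{f'(x)}{f(x)}}{L_2 - L_1(1 - \sigma^2 I_f)}$$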


One should note that $\psi$ depends on $\beta$ through $L_1$ and $L_2$. Further, evaluation of $L_1$ and $L_2$
is nontrivial since they are expectations of rational functions of random integrals whose
distribution is nontrivial to describe. Now, it is easy to check that if $L_1'$ and $L_2'$ are the
corresponding moments when the variance of the Brownian motion driving $Y$ is equal
to one, then $L_1 = L_1'/\sigma^2$ and $L_2 = L_2'/\sigma^2$, so it is enough to obtain $L_1'$ and $L_2'$. Following the
procedure in Williams (1942) one can obtain the moments of the ratio of powers of the
numerator (to be denoted by $N$) and denominator (to be denoted by $D$) of the ratio on
the RHS of equation (1.5) from the joint moment generating function of $N$ and $D$. Thus,
for example, if $\Lambda(s_0, s) = E[\exp\{-s_0 D - sN\}]$ then

$$\int_0^\infty \Lambda(s_0, 0)\,ds_0 = E\Big[\int_0^\infty e^{-s_0 D}\,ds_0\Big] = E\Big[\frac{1}{D}\Big] \tag{3.7}$$

and

$$\int_0^\infty\!\!\int_t^\infty \frac{\partial^2 \Lambda(s_0, s)}{\partial s^2}\Big|_{s=0}\,ds_0\,dt = \int_0^\infty\!\!\int_t^\infty E\big[N^2 e^{-s_0 D}\big]\,ds_0\,dt = E\Big[\frac{N^2}{D^2}\Big] \tag{3.8}$$

These formal manipulations will be valid as long as the interchanges of differentiation
and integration are valid. From equation (4.20) of Bobkoski (1983) we have that the joint
MGF of $N$ and $D$, when $Y(0) = 0$, is given by

$$\Lambda(s_0, s) = E\big(\exp(-s_0 D - sN)\big) = e^{(\beta + s)/2}\,\big[\cosh(z) + (\beta + s)\,\mathrm{shnc}(z)\big]^{-1/2} \tag{3.9}$$

where

$$z = (\beta^2 + 2\beta s + 2 s_0)^{1/2} \quad\text{and}\quad \mathrm{shnc}(z) = \frac{\sinh(z)}{z}.$$
Expressions for the MGF when the initial distribution is known are available (Llatas
(1987)). The choice of $Y(0) = 0$ is motivated by the convenience of checking the results
obtained by numerical integration with both simulations and the approximated moments
obtained by numerical integration of the explicit form of the asymptotic limiting
density function obtained by Bobkoski in this special case. The fact that $\Lambda$ in (3.9) is
differentiable and that the terms of these derivatives will be eventually dominated by $e^{-K\sqrt{s_0}}$,
where $K$ is a positive constant, as $s_0 \to \infty$ allow us to interchange the order of the
integration and differentiation in both (3.7) and (3.8) by application of the dominated
convergence theorem and Fubini-Tonelli theorem. In Table I we exhibit some of the
values of $L_1'$ and $L_2'$ calculated using the integration subroutine DQAGI in QUADPACK.
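
The computation of $L_2' = E[1/D]$ as the integral of the moment generating function over $s_0$ can be sketched without QUADPACK. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes the $s = 0$ slice of the MGF has the form $\Lambda(s_0, 0) = e^{\beta/2}[\cosh z + \beta \sinh(z)/z]^{-1/2}$ with $z = (\beta^2 + 2 s_0)^{1/2}$ (the case $Y(0) = 0$, unit innovation variance, $\beta > 0$), and it swaps in a composite Simpson rule on a truncated range for the DQAGI routine. The function names, `z_max` and `panels` are hypothetical.

```python
import math


def joint_mgf_at_s_zero(s0, beta):
    """Lambda(s0, 0) = E[exp(-s0 * D)] under the assumed form (Y(0) = 0, beta > 0)."""
    z = math.sqrt(beta * beta + 2.0 * s0)
    return math.exp(beta / 2.0) / math.sqrt(math.cosh(z) + beta * math.sinh(z) / z)


def l2_prime(beta, z_max=120.0, panels=20000):
    """Approximate L_2' = E[1/D] = integral over s0 in [0, inf) of Lambda(s0, 0).

    Substituting s0 = (z**2 - beta**2) / 2 gives ds0 = z dz with z in [beta, inf);
    the integral is truncated at z_max and evaluated by composite Simpson's rule.
    """
    def integrand(z):
        return z * joint_mgf_at_s_zero((z * z - beta * beta) / 2.0, beta)

    h = (z_max - beta) / panels  # panels must be even for Simpson's rule
    total = integrand(beta) + integrand(z_max)
    for i in range(1, panels):
        total += (4.0 if i % 2 else 2.0) * integrand(beta + i * h)
    return total * h / 3.0


for beta in (1.0, 2.0, 10.0):
    print(beta, round(l2_prime(beta), 4))
```

The printed values can be compared with the $L_2'$ column of Table I. Computing $L_1'$ along the same lines would additionally require the second derivative of the MGF with respect to $s$, which is not attempted here.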

Table I: Values of $L_1'$ and $L_2'$ obtained by numerical integration

    $\beta$      $L_1'$        $L_2'$
     0.200     13.698232      5.921848
     0.400     14.104907      6.285748
     0.600     14.507015      6.653889
     0.800     14.905686      7.025686
     1.000     15.301856      7.400631
     2.000     17.266291      9.309338
     3.000     19.228876     11.252599
     4.000     21.198798     13.214063
     5.000     23.175399     15.186088
     6.000     25.156913     17.164780
     7.000     27.141975     19.147965
     8.000     29.129653     21.134334
     9.000     31.119311     23.123046
    10.000     33.110506     25.113539
    11.000     35.102916     27.105415
    12.000     37.096305     29.098390
    13.000     39.090494     31.092254
    14.000     41.085346     33.086846
    15.000     43.080753     35.082042
    16.000     45.076630     37.077746
    17.000     47.072908     39.073881
    18.000     49.069531     41.070385
    19.000     51.066453     43.067207
    20.000     53.063637     45.064306

The values obtained present a very curious feature: they fall on what seem to be two
parallel straight lines with slope near 2 and intercept equal to 13.33 for $L_1'$ and 5.37 for
$L_2'$ (see Figure 1). A regression line was fitted to the values in Table I assuming the two
lines are indeed parallel and the regression equations are given by:

$$L_1' = 13.33 + 1.98\,\beta; \qquad L_2' = 5.37 + 1.98\,\beta$$

The residuals from this regression are shown in Figure 2. Figure 2 indicates the
true values would not fall on a straight line. Note the different behavior when $\beta < 1$.
However over the range considered the linear approximation might be satisfactory and
gives us a quick way to estimate the values of $L_1'$ and $L_2'$ without performing the
numerical integration. This may be advantageous when considering the two step estimation
procedure mentioned before. To check the values obtained by the numerical integration
we performed a small Monte Carlo experiment for $\beta = 2, 10, 20$ by evaluation of the
corresponding sample values of 10,000 series of sizes $n = 100, 500, 1000$. We also
evaluated the second moment of the asymptotic distribution from the representation of
the density of the limiting LSE error in Bobkoski (1983). The results are shown in
Table II. The latter values are slightly smaller than the ones calculated from (3.8).
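
The quality of the linear approximation can be checked directly against a few rows of Table I. This is a small illustrative sketch; the dictionary and function names are hypothetical, while the numbers are copied from Table I and from the fitted regression equations.

```python
# A few rows of Table I: beta -> (L_1', L_2').
TABLE_I = {
    0.2: (13.698232, 5.921848),
    1.0: (15.301856, 7.400631),
    2.0: (17.266291, 9.309338),
    10.0: (33.110506, 25.113539),
    20.0: (53.063637, 45.064306),
}


def l1_linear(beta):
    """Fitted line for L_1' reported in the text."""
    return 13.33 + 1.98 * beta


def l2_linear(beta):
    """Fitted line for L_2' reported in the text."""
    return 5.37 + 1.98 * beta


# Residual = tabulated value minus fitted value.
residuals = {
    beta: (l1 - l1_linear(beta), l2 - l2_linear(beta))
    for beta, (l1, l2) in TABLE_I.items()
}
for beta, (r1, r2) in sorted(residuals.items()):
    print(f"beta={beta:5.1f}  residual L1'={r1:+.4f}  residual L2'={r2:+.4f}")
```

For these rows the residuals stay small relative to the tabulated values, with the largest discrepancy for $L_2'$ at the smallest $\beta$, in line with the remark about the different behavior when $\beta < 1$.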
Table II. Comparison of results for $L_1'$ and $L_2'$

                          Numerical     Limiting     Monte Carlo Experiment
                          integration   density      n = 100     n = 500     n = 1000
    $\beta = 2$   $L_1'$   17.2663      17.2655      16.2126     16.9297     …
                                                     (0.4349)    …           …
                  $L_2'$    9.3093      *             9.3327     …           …
                                                     (0.0735)    …           …
    $\beta = 10$  $L_1'$   33.1105      33.1095      29.5127     31.1534     33.4475
                                                     (0.6165)    (0.6780)    (0.7743)
                  $L_2'$   25.1135      *            23.8719     24.7268     24.9647
                                                     (0.1026)    (0.1056)    (0.1091)
    $\beta = 20$  $L_1'$   53.0636      53.0627      …           51.3961     …
                                                     …           (0.9723)    …
                  $L_2'$   45.0643      *            …           44.4285     …
                                                     (0.1375)    …           …

Note: Values in parentheses are estimated standard errors for the quantity above.


The values shown for $\beta = 10, 20$ are obtained by integration on $[-70, \beta]$. For $\beta = 2$ the
range of integration is $[-35, 5.70]$. As for the Monte Carlo trials, the estimated values lie
within two standard deviations of the values obtained by numerical integration except
when $\beta = 20$, where the bias has not been overcome by the increase in the size of the
series. In any case the values are close enough to support the numerical integration
results. Less bias and smaller estimated standard deviations from the simulations would
be ideal, but lowering both the bias and the variance
may need more computer time than is convenient or even allowed on the facilities
used.


Now we are in the position to calculate values of $Q$ for the score functions
$\eta_1$, $\eta_2$ and $\psi$. By equation (3.2) and the observation about the relation between $L_i$ and $L_i'$
we have:

$$Q(\eta_1) = \sigma^2 L_1 = L_1'$$

$$Q(\eta_2) = (\sigma^4 I_f^2)^{-1}\big[L_1' - L_2'(1 - \sigma^2 I_f)\big]$$

$$Q(\psi) = \frac{L_1' L_2'}{L_2' - L_1'(1 - \sigma^2 I_f)}$$

Note that if $I_f'$ is the information when $\sigma^2 = 1$ we have that $\sigma^2 I_f = I_f'$, therefore the
asymptotic mean squared error for the score functions considered here does not depend
on the variance of the shocks. Moreover, it depends on the probability density function
of the shocks only through the information $I_f'$. Thus we will set $\sigma = 1$ and in this case
we have $I_f' \ge 1$ (Rustagi (1976)). Consequently

$$\frac{Q(\eta_1)}{Q(\psi)} = \frac{L_2' - L_1'(1 - I_f')}{L_2'} \ge 1 \tag{3.10}$$

and a minimum is obtained when $I_f' = 1$.

$$\frac{Q(\eta_2)}{Q(\psi)} = 1 + \frac{(L_1' - L_2')^2\,(I_f' - 1)}{L_1' L_2'\, I_f'^2} \tag{3.11}$$

and a maximum is obtained when $I_f' = 2$.

In Figure 3 we exhibit the ratio $Q(\eta_1)/Q(\psi)$ for the LSE for values of
$I_f' = \pi^2/9$, 1.50, and 2.00. In Figure 4 the ratio $Q(\eta_2)/Q(\psi)$ for the MLE is shown for the
same values of $I_f'$. Note that $I_f' = \pi^2/9$ corresponds to a logistic distribution with mean
zero and variance 1. From these figures one can see that the LSE can be very
"inefficient" while the MLE cannot be worse than 20% "inefficient" in the MSE sense.
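
The two efficiency ratios (3.10) and (3.11) are simple functions of $L_1'$, $L_2'$ and $I_f'$, so the 20% bound can be checked against the tabulated values. The following is an illustrative sketch; the function names and the grid over $I_f'$ are assumptions, while the $(L_1', L_2')$ pairs are copied from Table I.

```python
def lse_to_optimal_ratio(l1, l2, info):
    """Q(eta_1)/Q(psi) as in (3.10); l1, l2 stand for L_1', L_2' and info for I_f' >= 1."""
    return (l2 - l1 * (1.0 - info)) / l2


def mle_to_optimal_ratio(l1, l2, info):
    """Q(eta_2)/Q(psi) as in (3.11)."""
    return 1.0 + (l1 - l2) ** 2 * (info - 1.0) / (l1 * l2 * info ** 2)


# (L_1', L_2') for beta = 0.2, 2 and 20, copied from Table I.
rows = {0.2: (13.698232, 5.921848), 2.0: (17.266291, 9.309338), 20.0: (53.063637, 45.064306)}

# Grid of information values I_f' in [1, 10]; the value 2.0 is on the grid.
info_grid = [1.0 + i / 100.0 for i in range(901)]

worst_mle = {
    beta: max(mle_to_optimal_ratio(l1, l2, info) for info in info_grid)
    for beta, (l1, l2) in rows.items()
}
for beta, (l1, l2) in sorted(rows.items()):
    print(beta, "worst MLE ratio:", round(worst_mle[beta], 4),
          "LSE ratio at I_f' = 10:", round(lse_to_optimal_ratio(l1, l2, 10.0), 2))
```

Over this grid the MLE ratio is largest at $I_f' = 2$ and stays below 1.2 for the rows used, whereas the LSE ratio grows without bound as $I_f'$ increases.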

4. The large sample behavior of $\hat\phi_n$

In this section we will prove Theorem 2. First we establish the joint limiting
distribution of the sums in (2.4) as an application of Theorem 1.


Theorem 3: Consider the model (1.1) with initial value $y_n(0)$ as in the statement of
Theorem 2. Suppose that assumptions (2.A) to (2.C) hold. Consider the sequence of
processes on $D_{\mathbb{R}^3}[0,1]$ defined by

$$X_n(t) = \Big(n^{-1/2}\, y_n([nt]),\;\; n^{-1}\sum_{k=1}^{[nt]} y_n(k-1)\,\psi(\varepsilon(k)),\;\; n^{-3/2}\sum_{k=1}^{[nt]} y_n^2(k-1)\,\big[\psi'(\varepsilon(k)) - 1\big]\Big)' \tag{4.1}$$

Then $X_n \Rightarrow X$ as $n \to \infty$, where $X$ is the continuous process on $[0,1]$ given by

$$X(t) = \Big(Y(t),\;\; \int_0^t Y(s)\,dW_2(s),\;\; \int_0^t Y^2(s)\,dW_3(s)\Big)' \tag{4.2}$$

where $W$ is the 3-dimensional Brownian Motion defined below equation (2.11) and $Y$ is
the Ornstein-Uhlenbeck process defined by equation (2.7) with initial condition having
the stationary distribution $N(0, \sigma^2/2\beta)$.


Proof: First of all note that we can represent $W$ by

$$W(t) = \Gamma\, b(t) \tag{4.3}$$

where $b(t)$ is a 3-dimensional standard Brownian Motion with covariance $tI_3$ and
$\Gamma = ((\gamma_{ij}))$ is the Cholesky factor of $\Sigma$, i.e. $\Gamma$ is a lower triangular matrix such that
$\Gamma\Gamma' = \Sigma$. Now the process $X(t)$ satisfies the Stochastic Differential Equation:

$$dX(t) = \begin{pmatrix} -\beta X_1(t) \\ 0 \\ 0 \end{pmatrix} dt + \begin{pmatrix} 1 & 0 & 0 \\ 0 & X_1(t) & 0 \\ 0 & 0 & X_1^2(t) \end{pmatrix} dW(t) \;=:\; b(X(t))\,dt + G(X(t))\,dW(t) = b(X(t))\,dt + G(X(t))\,\Gamma\, db(t) \tag{4.4}$$

with initial condition $X(0) = (Y(0), 0, 0)'$. The last equality in (4.4) follows by equation
(4.3) and Itô's formula (Arnold (1974), page 90).

The functions $b$ and $G$ do not depend directly on time and they have continuous
partial derivatives of first order that are bounded on $\{|x| \le M\}$ for all $M > 0$.
Consequently by Corollary 6.3.3 of Arnold (1974), equation (4.4) has exactly one continuous
solution. Moreover the process $X(t)$ is a 3-dimensional diffusion process on $[0,1]$ with
drift vector $b(x)$ and diffusion matrix $a(x) = G(x)\Gamma\Gamma'G'(x) = G(x)\Sigma G'(x)$ (see Arnold
(1974), theorem 9.3.1, page 152). In this case $a(x)$ equals:

$$a(x) = \begin{pmatrix} \sigma_{11} & \sigma_{12}\,x_1 & \sigma_{13}\,x_1^2 \\ \sigma_{12}\,x_1 & \sigma_{22}\,x_1^2 & \sigma_{23}\,x_1^3 \\ \sigma_{13}\,x_1^2 & \sigma_{23}\,x_1^3 & \sigma_{33}\,x_1^4 \end{pmatrix} \tag{4.5}$$

Thus $X(t)$ is a solution of the associated martingale problem for the infinitesimal
operator of the diffusion, i.e.

$$A = -\beta x_1\,\frac{\partial}{\partial x_1} + \frac{1}{2}\sum_{i,j=1}^{3} a_{ij}(x)\,\frac{\partial^2}{\partial x_i\,\partial x_j} \tag{4.6}$$

with initial measure equal to $\mathrm{Law}(X(0))$, which should equal the weak limit of
$\mathrm{Law}(X_n(0))$ to have the appropriate limiting distribution. We claim that $\mathrm{Law}(X(0))$ is the
3-dimensional degenerate normal $N(0, \theta\sigma^2/2\beta)$ where $\theta_{ij}$ equals zero unless $i = j = 1$.
Our claim follows from the definition of $X_n(0)$ and the fact that

$$Y_n(0) = n^{-1/2}\sum_{k=0}^{\infty} \phi_n^k\,\varepsilon(-k)$$

converges weakly to a random variable distributed as $N(0, \sigma^2/2\beta)$ by an easy
application of the Lindeberg-Feller Central Limit Theorem to the triangular array defined by

$$T_{k,n} = \phi_n^k\,\varepsilon(-k), \qquad 0 \le k \le n^2.$$

Now, $X_n$ is a solution of the following stochastic difference equation

$$\Delta X_n(k/n) = \begin{pmatrix} -\beta X_{1,n}(k/n) \\ 0 \\ 0 \end{pmatrix}\Delta t + \begin{pmatrix} 1 & 0 & 0 \\ 0 & X_{1,n}(k/n) & 0 \\ 0 & 0 & X_{1,n}^2(k/n) \end{pmatrix}\Delta W_n(k/n) \tag{4.7}$$

with $W_n$ defined in equation (2.10), so it is natural to think that $X_n$ will approximate the
continuous process $X$. We proceed to prove this by finding 3-dimensional processes
$B_n(t)$ and $3 \times 3$ matrix-valued processes $A_n(t)$ such that the conditions of Theorem 1 are
satisfied. From equation (4.7) it follows that

$$\Delta X_n(k/n) = \begin{pmatrix} -\beta Y_n(k/n) \\ 0 \\ 0 \end{pmatrix}\Delta t + n^{-1/2}\,\xi_n(k+1)$$

where

$$\xi_n(k) = \Big(\varepsilon(k),\;\; n^{-1/2}\, y_n(k-1)\,\psi(\varepsilon(k)),\;\; n^{-1}\, y_n^2(k-1)\,\big[\psi'(\varepsilon(k)) - 1\big]\Big)'.$$

Since $E[\xi_n(k) \mid G_{k-1}] = 0$ the predictable compensator of $X_n$ is given by

$$B_n(t) = \sum_{k=0}^{[nt]-1} E\big[\Delta X_n(k/n) \mid G_k\big] = -\beta\,\Big(\sum_{k=0}^{[nt]-1} Y_n(k/n)\,\Delta t,\; 0,\; 0\Big)' \tag{4.8}$$

and writing $X_n(k/n) = \Delta X_n((k-1)/n) + X_n((k-1)/n)$ one can see that

$$M_n(k/n) = X_n(k/n) - B_n(k/n) = n^{-1/2}\,\xi_n(k) + \delta_n(k) \tag{4.9}$$

where $\delta_n(k)$ is $G_{k-1}$ measurable. Thus one can find $A_n$, the compensator of
$M_n(k/n)\,M_n'(k/n)$, as

$$A_n(t) = \sum_{k=1}^{[nt]} E\big[M_n(k/n)\,M_n'(k/n) - M_n((k-1)/n)\,M_n'((k-1)/n) \mid G_{k-1}\big] = n^{-1}\sum_{k=1}^{[nt]} E\big[\xi_n(k)\,\xi_n'(k) \mid G_{k-1}\big] \tag{4.10}$$

It follows from the last equality of (4.10) that the increments $A_n(t) - A_n(s)$, $t > s$, of the
process so defined are non-negative definite.

What is left now is to verify the "continuity" conditions (1.8) to (1.10) and the
"approximation" conditions (1.11) and (1.12) of Theorem 1. We start with the
approximation conditions. For condition (1.11) we have just to show

$$\sup_{0 \le t \le 1}\Big|B_{1,n}(t) - \int_0^t b_1(X_n(s))\,ds\Big| \xrightarrow{P} 0 \tag{4.11}$$

but the absolute value equals:

$$\Big|\sum_{k=0}^{[nt]-1} \beta\,Y_n(k/n)\,\Delta t - \beta\int_0^t Y_n(s)\,ds\Big| = \beta\,(t - [nt]/n)\,|Y_n(t)| \le \frac{\beta}{n}\,|Y_n(t)| \le \frac{\beta}{n}\,\|Y_n\|_\infty$$

Since $\|Y_n\|_\infty$ is bounded in probability (Bobkoski (1983), page 25) the last quantity
goes to zero as $n$ goes to infinity. Condition (1.12) will also follow by the same type of
argument and the boundedness of $\|Y_n^q\|_\infty$ for $q = 0, 1, 2, 3, 4$. To prove the continuity
conditions let $\tau_n^r$ be the stopping time defined in Theorem 1. Thus for $t < \tau_n^r$ we have
$|X_n(t)| < r$ and in particular

$$|Y_n(t)| < r \quad\text{for } t < \tau_n^r \tag{4.12}$$

Hence the continuity condition (1.10) for $A_n$ is easily verified when we note that it
reduces to proving that

$$\lim_{n\to\infty} n^{-1}\,E\Big[\sup_{t \le \tau_n^r} \big|Y_n^j(([nt]-1)/n)\big|\Big] = 0 \quad\text{for } j = 1, 2, 3, 4 \tag{4.13}$$

which is obvious by (4.12) since we are evaluating the process at a time point strictly
smaller than $\tau_n^r$. In the same way, the condition for $B_n$ reduces to

$$\lim_{n\to\infty} (\beta/n)^2\,E\Big[\sup_{t \le \tau_n^r} Y_n^2(([nt]-1)/n)\Big] = 0 \tag{4.14}$$

which follows again by (4.12).

Finally for the condition on the $X_n$ process it is sufficient to verify

$$\lim_{n\to\infty} E\Big[n^{-1}\sup_{k/n \le \tau_n^r}\big|\varepsilon^2(k) - (2\beta/n)\,\varepsilon(k)\,y_n(k-1) + (\beta/n)^2\,y_n^2(k-1)\big|\Big] = 0$$

$$\lim_{n\to\infty} E\Big[n^{-2}\sup_{k/n \le \tau_n^r}\big[y_n(k-1)\,\psi(\varepsilon(k))\big]^2\Big] = 0$$

$$\lim_{n\to\infty} E\Big[n^{-3}\sup_{k/n \le \tau_n^r}\big[y_n^2(k-1)\,(\psi'(\varepsilon(k)) - 1)\big]^2\Big] = 0$$

But each one of these conditions holds, by (4.12), Lemma 1 below and our assumption
on the moments of $\varepsilon$, $\psi(\varepsilon)$, and $\psi'(\varepsilon)$. Hence Theorem 1 guarantees the weak
convergence of $X_n$ to $X$. □


Remark: In the proof of Theorem 3 it is not necessary to make the assumption that
$y_n(0)$ has the stationary distribution. The result will follow as soon as $Y_n(0)$ has a weak
limit. In particular the result is true when one assumes $y_n(0)$ to be constant.


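Lemma 1: Let $\{\eta(k)\}_{k=1}^{\infty}$ be a sequence of iid random variables with finite
$(1+\delta)$-moment, then

$$n^{-1}\,E\Big[\max_{1 \le k \le n} \eta(k)\Big] \to 0 \quad\text{as } n \to \infty \tag{4.15}$$

Proof: Let $F$ be the cdf of $\eta(1)$. Define $x(u) = \inf\{x : F(x) \ge u\}$. By the so called
probability integral transformation $u = F(x)$

$$E\Big[\max_{1 \le k \le n} \eta(k)\Big] = \int_0^1 n\,x(u)\,u^{n-1}\,du \tag{4.16}$$

To show (4.15) we use Hölder's inequality

$$\Big|\int f g\Big| \le \Big[\int |f|^p\Big]^{1/p}\Big[\int |g|^q\Big]^{1/q}$$

with $f = x(u)$, $g = n\,u^{n-1}$, $p = 1 + \delta$, and $q = (1+\delta)/\delta$ to obtain

$$n^{-1}\,E\Big[\max_{1 \le k \le n} \eta(k)\Big] \le n^{-1}\,\Big[E\,|\eta(1)|^{1+\delta}\Big]^{\frac{1}{1+\delta}}\; n\,\Big[\frac{\delta}{(1+\delta)(n-1) + \delta}\Big]^{\frac{\delta}{1+\delta}} = O\big(n^{-\frac{\delta}{1+\delta}}\big) \qquad \square$$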
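
Lemma 1 can be illustrated with a distribution for which the expected maximum is known in closed form. The sketch below uses the unit exponential distribution, an assumption chosen only for illustration: for $n$ iid Exp(1) variables the expected maximum equals the harmonic number $H_n = \sum_{k=1}^{n} 1/k$, so $n^{-1} E[\max]$ can be evaluated exactly. The function name is hypothetical.

```python
def expected_max_exponential(n):
    """E[max of n iid Exp(1) variables] = H_n, the n-th harmonic number."""
    return sum(1.0 / k for k in range(1, n + 1))


sample_sizes = (10, 100, 1000, 10000)
scaled = [expected_max_exponential(n) / n for n in sample_sizes]
for n, value in zip(sample_sizes, scaled):
    print(n, value)
```

Since $H_n$ grows like $\log n$, the scaled quantity decreases to zero, consistent with (4.15); the exponential distribution has all moments finite, so the moment condition of the lemma holds for any $\delta > 0$.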


The next result proves the weak convergence of the terms of the Taylor expansion
in equation (2.4) and in particular the joint convergence of
$\big(\sum_{k=1}^{n-1} Y_n^2(k/n)\,\Delta t,\; \sum_{k=1}^{n-1} Y_n(k/n)\,\Delta W_{2,n}(k/n)\big)'$ to the random vector
$\big(\int_0^1 Y^2(s)\,ds,\; \int_0^1 Y(s)\,dW_2(s)\big)'$.

Lemma 2: Under model (1.1) and assumptions (2.A) to (2.C) the sequence of
4-dimensional random vectors

$$Z_n = \Big(\sum_{k=1}^{n-1} Y_n^2(k/n)\,\Delta t,\;\; \sum_{k=1}^{n-1} |Y_n^3(k/n)|\,\Delta t,\;\; \sum_{k=1}^{n-1} Y_n(k/n)\,\Delta W_{2,n}(k/n),\;\; \sum_{k=1}^{n-1} Y_n^2(k/n)\,\Delta W_{3,n}(k/n)\Big)'$$

converges weakly to

$$Z = \Big(\int_0^1 Y^2(s)\,ds,\;\; \int_0^1 |Y^3(s)|\,ds,\;\; \int_0^1 Y(s)\,dW_2(s),\;\; \int_0^1 Y^2(s)\,dW_3(s)\Big)'$$

Proof: Consider the transformation $g : C_{\mathbb{R}^3}[0,1] \to \mathbb{R}^4$ such that
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S 




e,=  I  IY*W  IW)a(i) 

t-i  t=i 


which is bounded by 1. Theorem 3 implies that $T_{0n}$ is $O_p(n^{-1})$ and $T_{2n}$ is $O_p(n^{-1/2})$, while Lemma 2 implies that $T_{3n}$ is $O_p(n^{1/2})$ and $T_{1n}$ converges weakly to a random variable which is positive with probability 1. (This last claim follows from the fact that if $Y = 0$ a.e. then necessarily $W = 0$ a.e., which is a contradiction.) Hence, if $\gamma$ is an arbitrarily small positive number, there exists an $N$ such that for all $n > N$ there exist finite positive constants $M_0, M_1, M_2, M_3$ such that


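$$\begin{aligned}
P\big[\,|T_{0n}| < n^{-1} M_0\,\big] &> 1 - \gamma/4 \\
P\big[\,T_{1n} > M_1\,\big] &> 1 - \gamma/4 \\
P\big[\,|T_{2n}| < n^{-1/2} M_2\,\big] &> 1 - \gamma/4 \\
P\big[\,|T_{3n}| < n^{1/2} M_3\,\big] &> 1 - \gamma/4
\end{aligned} \tag{4.18}$$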


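thus, with probability at least $1 - \gamma$,

$$n^{-2}\Psi(C) > -M_0 n^{-1} - M_1 (C - \phi_n) - M_2 n^{-1/2} |C - \phi_n| - M_3 n^{1/2} (C - \phi_n)^2 \tag{4.19}$$

$$n^{-2}\Psi(C) < M_0 n^{-1} - M_1 (C - \phi_n) + M_2 n^{-1/2} |C - \phi_n| + M_3 n^{1/2} (C - \phi_n)^2 \tag{4.20}$$

Now, choose $n$ large enough so that

$$n^{-1/2}\, \frac{2M_0}{M_1}\, M_2 + n^{-1/2} \left(\frac{2M_0}{M_1}\right)^2 M_3 < \frac{M_0}{2}$$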


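and for such $n$, let

$$C_1 = \phi_n - \frac{2M_0}{M_1}\, n^{-1}, \qquad C_2 = \phi_n + \frac{2M_0}{M_1}\, n^{-1};$$

hence equation (4.19) gives:

$$n^{-2}\Psi(C_1) > -M_0 n^{-1} + 2M_0 n^{-1} - n^{-3/2}\, \frac{2M_0}{M_1}\, M_2 - n^{-3/2} \left(\frac{2M_0}{M_1}\right)^2 M_3 > \left[M_0 - \frac{M_0}{2}\right] n^{-1} > 0$$

while (4.20) gives:

$$n^{-2}\Psi(C_2) < M_0 n^{-1} - 2M_0 n^{-1} + n^{-3/2}\, \frac{2M_0}{M_1}\, M_2 + n^{-3/2} \left(\frac{2M_0}{M_1}\right)^2 M_3 < \left[-M_0 + \frac{M_0}{2}\right] n^{-1} < 0$$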


Thus, since $\Psi(C)$ is continuous, the equation $\Psi(C) = 0$ will, with probability exceeding $1 - \gamma$, have a root, $\hat{\phi}_n$, between $C_1$ and $C_2$, as we wished. Moreover

$$|\hat{\phi}_n - \phi_n| < \frac{4M_0}{M_1}\, n^{-1} \quad \text{with probability } 1 - \gamma$$

and consequently the proof of part (a) is completed.
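The existence argument for part (a) is a sign-change argument: a continuous $\Psi$ that is positive at $C_1$ and negative at $C_2$ must vanish in between. The minimal sketch below applies the same idea numerically by bisection. It rests on several assumptions that are not taken from the text: $\Psi(C) = \sum_k \psi(y_n(k+1) - C\,y_n(k))\,y_n(k)$ as the estimating function suggested by (1.3) with $\psi = \rho'$, Huber's $\psi$ with tuning constant 1.345, standard Gaussian shocks, and $y_n(0) = 0$. All function names are hypothetical.

```python
import random

def huber_psi(x, c=1.345):
    """Huber's psi: identity on [-c, c], clipped outside (illustrative choice)."""
    return max(-c, min(c, x))

def simulate_ar1(n, beta, seed=0):
    """Simulate y(k+1) = phi_n * y(k) + e(k+1) with phi_n = 1 - beta/n, y(0) = 0."""
    rng = random.Random(seed)
    phi_n = 1.0 - beta / n
    y = [0.0]
    for _ in range(n):
        y.append(phi_n * y[-1] + rng.gauss(0.0, 1.0))
    return y, phi_n

def psi_sum(y, c_value):
    """Psi(C) = sum_k psi(y(k+1) - C*y(k)) * y(k); non-increasing in C for monotone psi."""
    return sum(huber_psi(y[k + 1] - c_value * y[k]) * y[k]
               for k in range(len(y) - 1))

def m_estimate(y, lo=0.0, hi=2.0, tol=1e-10):
    """Bisection for a root of Psi, widening [lo, hi] until Psi changes sign."""
    for _ in range(60):
        if psi_sum(y, lo) > 0.0 and psi_sum(y, hi) < 0.0:
            break
        lo, hi = lo - 1.0, hi + 1.0
    else:
        raise ValueError("no sign change found")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi_sum(y, mid) > 0.0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at or to the left of mid
    return 0.5 * (lo + hi)

y, phi_n = simulate_ar1(n=500, beta=1.0, seed=1)
print(phi_n, m_estimate(y))
```

Bisection is used here only because it mirrors the bracketing in the proof; any root finder that keeps the bracket would serve equally well.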
For part (b) we just have to write


$n(\hat{\phi}_n - \phi_n)$  =


(4.21) 


It  follows  from  the  preceding  discussion  that  converges  in  probability 


to zero while, by Lemma 2, $(nT_{0n}, T_{1n})$ converges jointly to $\Big(\int_0^1 Y(s)\,dW_2(s),\ \int_0^1 Y^2(s)\,ds\Big)$.

Thus the weak convergence of the right hand side of equation (4.21) to the random variable in (2.7) is guaranteed by a straightforward application of Slutsky's theorem and
Theorem  5.1  in  Billingsley  (1968).  □ 


5.  Appendix 

Let $W(t)$ be the 3-dimensional Brownian motion defined in Section 2. As noted before in the proof of Theorem 3, we can represent this process by

$$W(t) = \Gamma\, b(t)$$

where $b(t)$ is a 3-dimensional standard Brownian motion (with covariance $tI$) and $\Gamma = (\gamma_{ij})$ is the Cholesky factor of $\Sigma$, i.e. $\Gamma$ is a $3 \times 3$ lower triangular matrix such that $\Gamma\Gamma' = \Sigma$. Using this representation we can prove that $Q(\psi)$ can be expressed as in (3.2). By Itô's theorem (Arnold (1974) page 90) we can write


$$\int_0^1 Y(s)\,dW_2(s) = \gamma_{21} \int_0^1 Y(s)\,db_1(s) + \gamma_{22} \int_0^1 Y(s)\,db_2(s)$$
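To make the Cholesky representation concrete, the following minimal sketch computes a lower triangular factor of a $3 \times 3$ covariance matrix and reads off $\gamma_{21}$ and $\gamma_{22}$. The matrix `sigma` is a hypothetical example chosen only for illustration and is not the $\Sigma$ of Section 2. For any such factor, $\gamma_{21} = \sigma_{12}/\sqrt{\sigma_{11}}$ and $\gamma_{22}^2 = \sigma_{22} - \sigma_{12}^2/\sigma_{11}$.

```python
import math

def cholesky_lower(sigma):
    """Return lower triangular G with G G' = sigma (sigma symmetric positive definite)."""
    d = len(sigma)
    g = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(g[i][m] * g[j][m] for m in range(j))
            if i == j:
                g[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                g[i][j] = (sigma[i][j] - s) / g[j][j]
    return g

# Hypothetical covariance matrix, for illustration only.
sigma = [[1.0, 0.6, 0.2],
         [0.6, 2.0, 0.5],
         [0.2, 0.5, 1.5]]
gamma = cholesky_lower(sigma)

# Second row of the factor: W_2 = gamma_21 * b_1 + gamma_22 * b_2.
gamma_21, gamma_22 = gamma[1][0], gamma[1][1]
print(gamma_21, gamma_22)
```

Because the factor is lower triangular, $W_1$ depends on $b_1$ alone, which is what the decomposition above exploits.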


Note that $W_1 = \gamma_{11} b_1$ and consequently the process $Y$ defined by the SDE (2.7) is independent of $b_2$ and


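From (5.1) we have

$$Q(\psi) = \gamma_{21}^2\, E\left[\frac{\big(\int_0^1 Y(s)\,db_1(s)\big)^2}{\big(\int_0^1 Y^2(s)\,ds\big)^2}\right] + \gamma_{22}^2\, E\left[\frac{\big(\int_0^1 Y(s)\,db_2(s)\big)^2}{\big(\int_0^1 Y^2(s)\,ds\big)^2}\right] + 2\gamma_{21}\gamma_{22}\, E\left[\frac{\int_0^1 Y(s)\,db_1(s) \int_0^1 Y(s)\,db_2(s)}{\big(\int_0^1 Y^2(s)\,ds\big)^2}\right] \tag{5.2}$$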


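Define $\mathcal{F}_t = \sigma(b(s),\ 0 \le s \le t)$ and $\mathcal{F}_t^{(1)} = \sigma(b_1(s),\ 0 \le s \le t)$. We claim that for any $\mathcal{F}_1^{(1)}$-measurable random function $h(t)$ we have:

$$E\left[\int_0^1 h(s)\,db_2(s) \,\Big|\, \mathcal{F}_1^{(1)}\right] = 0$$

$$E\left[\Big(\int_0^1 h(s)\,db_2(s)\Big)^2 \,\Big|\, \mathcal{F}_1^{(1)}\right] = \int_0^1 h^2(s)\,ds$$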


This can be proven by first looking at $\mathcal{F}_1^{(1)}$-measurable step functions and making use of the fact that $b_1$ and $b_2$ are independent. Then the usual limiting argument gives the result.
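The two conditional identities can be illustrated by simulation. The minimal sketch below takes $h(s) = b_1(s)$, an assumption made only for illustration, discretizes $[0,1]$ into left-endpoint sums, and checks two unconditional consequences obtained from the tower property: $E\big[\int h\,db_1 \int h\,db_2\big] = 0$ and $E\big[(\int h\,db_2)^2\big] = E\big[\int h^2\,ds\big]$. It does not verify the conditional statements themselves, and the function name is hypothetical.

```python
import math
import random

def simulate_moments(paths=4000, steps=100, seed=0):
    """Monte Carlo averages of (S1*S2, S2^2, V) with h(s) = b_1(s).

    S1, S2 are left-endpoint sums approximating the stochastic integrals of h
    against b_1 and b_2, and V approximates the integral of h^2 over [0, 1].
    """
    rng = random.Random(seed)
    dt = 1.0 / steps
    sd = math.sqrt(dt)
    cross = second = integral = 0.0
    for _ in range(paths):
        b1 = 0.0
        s1 = s2 = v = 0.0
        for _ in range(steps):
            db1 = rng.gauss(0.0, sd)   # increment of b_1
            db2 = rng.gauss(0.0, sd)   # independent increment of b_2
            h = b1                     # integrand at the left endpoint
            s1 += h * db1
            s2 += h * db2
            v += h * h * dt
            b1 += db1
        cross += s1 * s2
        second += s2 * s2
        integral += v
    return cross / paths, second / paths, integral / paths

cross, second, integral = simulate_moments()
print(cross, second, integral)
```

With these settings the cross moment should be close to zero and the last two averages close to each other, up to Monte Carlo error.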

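Consequently, since $\{Y(t),\ 0 \le t \le 1\}$ is $\mathcal{F}_1^{(1)}$-measurable, one obtains that $E\Big(\int_0^1 Y(s)\,db_2(s) \,\Big|\, \mathcal{F}_1^{(1)}\Big) = 0$. Thus the expectation of the cross product in (5.2) vanishes, since $\int_0^1 Y(s)\,db_1(s)$ and $\int_0^1 Y^2(s)\,ds$ are $\mathcal{F}_1^{(1)}$-measurable. Also

$$E\left[\Big(\int_0^1 Y^2(s)\,ds\Big)^{-2} \Big(\int_0^1 Y(s)\,db_2(s)\Big)^2\right] = E\left[\Big(\int_0^1 Y^2(s)\,ds\Big)^{-2} E\Big[\Big(\int_0^1 Y(s)\,db_2(s)\Big)^2 \,\Big|\, \mathcal{F}_1^{(1)}\Big]\right] = E\left[\Big(\int_0^1 Y^2(s)\,ds\Big)^{-1}\right]$$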


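From all this discussion $Q(\psi)$ reduces to

$$Q(\psi) = \gamma_{21}^2\, E\left[\left(\frac{\int_0^1 Y(s)\,db_1(s)}{\int_0^1 Y^2(s)\,ds}\right)^2\right] + \gamma_{22}^2\, E\left[\Big(\int_0^1 Y^2(s)\,ds\Big)^{-1}\right] \tag{5.3}$$

Plugging in the values of $\gamma_{21}$ and $\gamma_{22}$ into (5.3) gives expression (3.2).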




Figure  2:  Residuals  from  linear  regression 
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